Using Confidence Bounds for Exploitation-Exploration Trade-offs
Abstract
We show how a standard tool from statistics, namely confidence bounds, can be used to elegantly deal with situations which exhibit an exploitation-exploration trade-off. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitation-versus-exploration decisions based on uncertain information provided by a random process. We apply our technique to two models with such an exploitation-exploration trade-off. For the adversarial bandit problem with shifting, our new algorithm suffers only Õ((ST)^{1/2}) regret with high probability over T trials with S shifts. Such a regret bound was previously known only in expectation. The second model we consider is associative reinforcement learning with linear value functions. For this model our technique improves the regret from Õ(T^{3/4}) to Õ(T^{1/2}).
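As a rough illustration of the general idea of acting on confidence bounds, the sketch below plays the arm whose empirical mean plus confidence width is largest. It is an assumption-laden toy: it uses a UCB1-style index in the simple stochastic Bernoulli setting, which is not either of the two models treated in the paper, and the names `ucb_index` and `run_ucb` are hypothetical.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """Empirical mean plus a confidence width that shrinks as pulls grow.

    The width sqrt(2 ln t / pulls) is the UCB1-style choice, assumed here
    for illustration only.
    """
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def run_ucb(arm_probs, horizon, seed=0):
    """Play Bernoulli arms for `horizon` rounds; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k   # number of times each arm was chosen
    sums = [0.0] * k  # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so that each bound is well defined.
            arm = t - 1
        else:
            # Choose the arm with the largest upper confidence bound:
            # a high mean favors exploitation, a wide bound favors exploration.
            arm = max(
                range(k),
                key=lambda i: ucb_index(sums[i] / pulls[i], pulls[i], t),
            )
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


if __name__ == "__main__":
    # Hypothetical arm success probabilities chosen for the demo.
    print(run_ucb([0.2, 0.5, 0.8], horizon=2000))
```

In this toy setting, rarely pulled arms keep a wide bound and so are revisited occasionally, while the pull counts concentrate on the arm with the highest success probability.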
Similar resources
Learning and innovation: Exploitation and exploration trade-offs☆
This paper examines the relationship between learning and innovation outcomes, focusing on the trade-off between exploitation and exploration in learning and innovation. The study identifies two types of learning and two outcomes of innovation. Exploitation and exploration in learning are inversely associated with innovation rates and impact. While exploitative, localized ...
Balance Within and Across Domains: The Performance Implications of Exploration and Exploitation in Alliances
Organizational research advocates that firms balance exploration and exploitation, yet it acknowledges inherent challenges in reconciling these opposing activities. To overcome these challenges, such research suggests that firms establish organizational separation between exploring and exploiting units or engage in temporal separation whereby they oscillate between exploration and exploitation ...
Ethical Perspective: Five Unacceptable Trade-offs on the Path to Universal Health Coverage
This article discusses what ethicists have called “unacceptable trade-offs” in health policy choices related to universal health coverage (UHC). Since the fiscal space is constrained, trade-offs need to be made. But some trade-offs are unacceptable on the path to universal coverage. Unacceptable choices include, among other examples from low-income countries, to expand coverage for services wit...
On multilabel classification and ranking with bandit feedback
We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We sho...
On Multilabel Classification and Ranking
We present a novel multilabel/ranking algorithm working in partial information settings. The algorithm is based on 2nd-order descent methods, and relies on upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where covariates can be adversarial, but multilabel probabilities are ruled by (generalized) linear models. We sho...
Journal title:
- Journal of Machine Learning Research
Volume 3
Publication date 2002